Health Effects Associated With Electronic Cigarette Use: Automated Mining of Online Forums.
BACKGROUND: Our previous infodemiological study was performed by manually mining health-effect data associated with electronic cigarettes (ECs) from online forums. Manual mining is time consuming and limits the number of posts that can be retrieved. OBJECTIVE: Our goal in this study was to automatically extract and analyze a large number (>41,000) of online forum posts related to the health effects associated with EC use between 2008 and 2015. METHODS: Data were annotated with medical concepts from the Unified Medical Language System using a modified version of the MetaMap tool. Of over 1.4 million posts, 41,216 were used to analyze symptoms (undiagnosed conditions) and disorders (physician-diagnosed terminology) associated with EC use. Each post was also assigned a sentiment (positive, negative, or neutral). RESULTS: Symptom and disorder data were categorized into 12 organ systems or anatomical regions. Most posts on symptoms and disorders contained negative sentiment, and the affected systems were similar across all years. Health effects were reported most often in the neurological, mouth and throat, and respiratory systems. The most frequently reported symptoms and disorders were headache (n=939), coughing (n=852), malaise (n=468), asthma (n=916), dehydration (n=803), and pharyngitis (n=565). In addition, users often reported linked symptoms (e.g., coughing and headache). CONCLUSIONS: Online forums are a valuable repository of data that can be used to identify positive and negative health effects associated with EC use. By automating the extraction of online information, we obtained more data than in our prior study, identified new symptoms and disorders associated with EC use, determined which systems are most frequently adversely affected, identified the specific symptoms and disorders most commonly reported, and tracked health effects over 7 years.
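The aggregation step described in the abstract groups concepts by organ system, tallies sentiment, and counts linked symptoms. The Python sketch below shows one minimal way to do it. It assumes each post has already been reduced to a list of extracted concept names, for example by a MetaMap-style annotation step, together with a sentiment label. The `ORGAN_SYSTEM` mapping and its category names are hypothetical placeholders and do not reproduce the study's 12-category scheme.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concept -> organ-system mapping (illustrative only;
# not the study's actual 12 categories).
ORGAN_SYSTEM = {
    "headache": "neurological",
    "coughing": "respiratory",
    "asthma": "respiratory",
    "pharyngitis": "mouth and throat",
    "dehydration": "metabolic",
    "malaise": "general",
}


def tally(posts):
    """Aggregate pre-annotated posts.

    posts: iterable of (concepts, sentiment) pairs, where concepts is a
    list of concept names already extracted from one forum post.
    Returns (per-system counts, per-sentiment counts, linked-pair counts).
    """
    by_system = Counter()
    by_sentiment = Counter()
    linked = Counter()
    for concepts, sentiment in posts:
        unique = sorted(set(concepts))  # count each concept once per post
        by_sentiment[sentiment] += 1
        for concept in unique:
            # Concepts missing from the mapping fall into "other".
            by_system[ORGAN_SYSTEM.get(concept, "other")] += 1
        # Every unordered pair of concepts in the same post counts as "linked".
        for a, b in combinations(unique, 2):
            linked[(a, b)] += 1
    return by_system, by_sentiment, linked
```

Pairs are stored in sorted order, so a post that mentions coughing and headache increments `("coughing", "headache")` no matter which term appears first in the text.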
Analysis and Querying of Health-Related Social Media
The increased popularity of social media and the copious amount of user-generated data in the last few years have impacted various aspects of individuals' lives. The use of social media for health-care-related purposes, which is the focus of this thesis, has increased exponentially. This provides researchers with a massive volume of data that, if properly mined and analyzed, can augment traditional health-related data sources such as electronic medical records. Despite advances in text analytics, this data remains challenging to analyze because of its specialized vocabulary, difficulties in data collection, and missing values. In this thesis, we focus on two research directions: (a) analyzing the demographics of users who participate in health-related social media, along with the content they post across a wide range of sources, and highlighting specific health issues reported by users; and (b) effectively querying health-related social media and other health-related documents, a problem that generalizes to querying annotated documents. Specifically, our first contribution studies the demographics of users who participate in health-related social media, in order to identify possible links to health care disparities. Using these demographics, our second contribution analyzes the content of posts grouped by demographic segment, applying information extraction methods to extract medical concepts, identify the top distinctive terms, and measure sentiment and emotion. Our third contribution extends this content analysis by studying the intent of posts generated by users across different data sources.
Lastly, we focus on a specific domain, electronic cigarettes, and analyze the health-related effects reported by online users. In the second direction of this thesis, we developed a query framework that helps users efficiently explore health-related data, whether in online social media or in other medical documents, by exploiting the relationships between network users or between the concepts inside the documents. Our solution generalizes to other domains with similar properties, such as general-purpose social networks. We refer to this problem as keyword querying on graph-annotated documents: querying documents annotated by interconnected entities that are related to each other through association graphs. Our novel framework balances the importance of text relevance and semantic relevance through the graph
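The abstract says the second contribution identifies the top distinctive terms for each demographic segment but does not describe the scoring method. The sketch below is one common option and is not necessarily the thesis's technique. It compares a segment's posts against all other posts using an add-alpha smoothed log-ratio of term probabilities. Whitespace tokenization, the function name `distinctive_terms`, and the default `alpha=1.0` are assumptions made for illustration.

```python
import math
from collections import Counter


def distinctive_terms(segment_docs, other_docs, top_k=5, alpha=1.0):
    """Rank terms that are over-represented in segment_docs vs. other_docs.

    Hypothetical helper: uses naive whitespace tokenization and
    add-alpha smoothing so terms absent from one side still get a score.
    """
    seg = Counter(w for d in segment_docs for w in d.lower().split())
    oth = Counter(w for d in other_docs for w in d.lower().split())
    vocab = set(seg) | set(oth)
    # Smoothed denominators: total tokens plus alpha for every vocabulary term.
    n_seg = sum(seg.values()) + alpha * len(vocab)
    n_oth = sum(oth.values()) + alpha * len(vocab)
    scores = {}
    for w in vocab:
        p = (seg[w] + alpha) / n_seg  # smoothed P(w | segment)
        q = (oth[w] + alpha) / n_oth  # smoothed P(w | other posts)
        scores[w] = math.log(p) - math.log(q)  # > 0 means over-represented
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

Smoothing keeps a term that occurs only once in a small segment from dominating the ranking. A larger `alpha` pulls rare terms further toward zero.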
Querying Documents Annotated by Interconnected Entities
In a large number of applications, from biomedical literature to social networks, there are collections of text documents annotated by interconnected entities, which are related to each other through association graphs. For example, social posts are related through the friendship graph of their authors, and PubMed articles are annotated by MeSH terms, which are related through ontological relationships. To effectively query such collections, the semantic distance between the entities of a document and those of the query must be taken into account in addition to the document's text relevance. In this paper, we propose a novel query framework, which we refer to as keyword querying on graph-annotated documents, along with query techniques to answer such queries. Our methods automatically balance the impact of the graph entities and the text content on the ranking. Our qualitative evaluation on a real dataset shows that our methods improve ranking quality compared to baseline ranking systems.
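To illustrate how text relevance and entity-graph distance can be combined into one ranking, here is a minimal sketch. It is not the paper's method. Text relevance is the fraction of query keywords that appear in the document. Semantic relevance averages `1 / (1 + d)` over the document's entities, where `d` is the hop distance to the nearest query entity, found by a multi-source breadth-first search. Unreachable entities contribute 0. The paper balances the two signals automatically, whereas this sketch uses a fixed weight `lam`. All names, the graph, and the scoring formulas are hypothetical.

```python
from collections import deque


def graph_distances(graph, sources):
    """Multi-source BFS: hop distance from each reachable entity to the nearest source."""
    dist = {s: 0 for s in sources}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def rank(docs, graph, keywords, query_entities, lam=0.5):
    """Rank (doc_id, text, entities) triples by a weighted mix of text and graph relevance.

    lam is a hypothetical fixed weight: 1.0 = text only, 0.0 = graph only.
    """
    dist = graph_distances(graph, query_entities)
    kw = {k.lower() for k in keywords}
    scored = []
    for doc_id, text, entities in docs:
        words = set(text.lower().split())
        text_rel = len(kw & words) / len(kw) if kw else 0.0
        # Closer entities score higher; entities not reachable from the query score 0.
        sem = [1.0 / (1 + dist[e]) if e in dist else 0.0 for e in entities]
        sem_rel = sum(sem) / len(sem) if sem else 0.0
        scored.append((lam * text_rel + (1 - lam) * sem_rel, doc_id))
    scored.sort(key=lambda t: (-t[0], t[1]))  # best first, ties by id
    return [doc_id for _, doc_id in scored]
```

With a toy MeSH-like graph in which Asthma links to Respiratory Tract Diseases, which links to Bronchitis, the query keyword "cough" with entity Asthma ranks a post that mentions "cough" and is annotated with Bronchitis first at `lam=0.5`. At `lam=0.0` the ranking is purely by graph proximity, and a post annotated directly with Asthma rises to the top even though its text does not contain the keyword.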